The minimal important difference of patient-reported outcome measures related to female urinary incontinence: a systematic review

Background The minimal important difference is a valuable metric for ascertaining the clinical relevance of a treatment and offers useful guidance for patient management. Evidence on this metric for outcomes related to female urinary incontinence is lacking, which might negatively affect clinical decision-making.
Objectives To summarize the minimal important difference of patient-reported outcome measures associated with urinary incontinence, calculated according to both distribution- and anchor-based methods.
Methods This systematic review was conducted according to the PRISMA guidelines. A search strategy including the main terms for urinary incontinence and minimal important difference was run in five databases (Medline, Embase, CINAHL, Web of Science, and Scopus) on June 9, 2021 and updated on January 9, 2024, with no limits on date, language, or publication status. Studies that provided minimal important differences (distribution- or anchor-based methods) for patient-reported outcome measures related to female urinary incontinence were included. Study selection and data extraction were performed independently by two researchers. Only studies that reported the minimal important difference according to anchor-based methods were assessed for credibility and certainty of the evidence. When possible, absolute minimal important differences were calculated for each study separately from the mean change of the group of participants who slightly improved.
Results Twelve studies were included. Thirteen questionnaires were found, with their respective minimal important differences reported according to distribution-based (effect size, standard error of measurement, standardized response mean) and anchor-based methods. Most of the anchor-based estimates did not use the smallest difference identified by the participants to calculate the minimal important difference. All reports related to anchor-based methods presented low credibility and very low certainty of the evidence. We pooled 20 different estimates of minimal important differences using data from primary studies, considering different anchors and questionnaires.
Conclusions The minimal important difference of patient-reported outcome measures for urinary incontinence varies widely according to the method of analysis, questionnaires, and anchors used; however, the credibility and certainty of the evidence supporting these estimates are still limited.
Supplementary Information The online version contains supplementary material available at 10.1186/s12874-024-02188-4.


Introduction
The International Continence Society defines urinary incontinence as any involuntary loss of urine [1]. Stress urinary incontinence is defined as urine loss associated with coughing, sneezing, effort, or physical exertion; urgency urinary incontinence is defined as loss of urine associated with urinary urgency (a sudden and strong urge to urinate); and mixed urinary incontinence combines stress and urgency incontinence [1].
According to the World Health Organization, urinary incontinence affects more than 200 million people worldwide [2,3] and is more prevalent in women [4]. One in four women will be incontinent at some point in life [4,5]. The high prevalence of urinary incontinence concerns government institutions because the related care costs are high, at around 117 million and $66 billion (2007 US dollars) per year in the United Kingdom [6] and the United States of America [7], respectively. The consequences of urinary incontinence include impairment of the social, psychological, financial, and sexual aspects of a woman's life. These in turn can be related to reduced quality of life [8] and self-esteem, and to social isolation [9]. Moreover, urinary incontinence is a predictor of mortality, especially among older adults [10].
Patient-reported outcome measures and voiding diaries are used to measure the quality of life of patients with urinary incontinence, as well as to quantify urinary loss. In both clinical practice and research, patient-reported outcome measures are useful for reporting the effects of interventions because they take into consideration the patients' perspective on the changes observed after treatment. However, the interpretation of research results generally focuses on statistical analyses, that is, on whether the result of an intervention may or may not be considered statistically significant [11]. Interpreting p values alone is insufficient to demonstrate the impact of an intervention on the health care of individuals [12,13], because research findings may be statistically significant without being clinically relevant when the patient did not have a clinically significant improvement [14].
The analysis of clinical significance has increasingly been used in health research to determine whether the result of a treatment is perceived as beneficial from the perspective of the patient or any other stakeholder [15]. One of the methods used to help interpret the clinical relevance of research results is the minimal important difference of clinical outcome measures. The minimal important difference has been defined as "the smallest difference in score in the domain of interest that patients perceive as important, either beneficial or harmful, and which would lead the clinician to consider a change in the patient's management" [16].
There are two different methods to determine the minimal important difference [17]. (1) Distribution-based methods use statistical calculations based on the distribution of outcome scores to determine how the scores differ among patients [18]. Although these methods are easily applied, they do not evaluate the clinical relevance of the intervention according to the patient's perception [16]. (2) Anchor-based methods take into consideration patients' perceptions by using interpretive and self-reported tools, such as the global rating of change scale [19-22], to assess change in the outcome that represents a meaningful degree of change [23]. In this case, the patient has the autonomy to assign a numerical value to the status of the main complaint according to their own perception. However, psychosocial factors, for example, could influence the patient's global status, which may interfere with the variable of interest [16].
Previous systematic reviews have assessed the minimal important difference for outcomes related to the musculoskeletal [24-26] and oncological [27] areas, but none has focused on the minimal important difference for outcomes related to urinary incontinence. This has a negative impact on the research field, as it impairs the estimation of sample sizes and the interpretation of the results of clinical trials. This gap in the literature may lead to over- or underestimation of the clinical significance of studies that have already been published or will be published in the future. In addition, the lack of clear guidance on how to interpret the clinical relevance of results from urinary incontinence outcomes does not contribute to evidence-based practice [28]. Synthesizing the evidence about the clinical relevance of instruments related to urinary incontinence may benefit clinicians and researchers [29] and improve decision-making by informing the minimal important difference of specific instruments, which may be used in clinical and scientific practice [30].
Therefore, the aims of the present systematic review were: I) to identify and synthesize all distribution-based and anchor-based methods used to estimate the minimal important difference for outcome measures related to urinary incontinence; II) to summarize minimal important difference estimates for the most commonly used outcome measures related to urinary incontinence; III) to determine the credibility of the minimal important differences reported in each study.

Methods
This is a systematic review conducted according to the PRISMA [31] and COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) [32] guidelines and registered in PROSPERO (protocol CRD42022299686).

Eligibility criteria, information sources, search strategy
The inclusion and exclusion criteria were based on, and adapted from, the PICOS and COSMIN frameworks, as described below:
Population: Women older than 18 years with stress, urge, and/or mixed urinary incontinence according to International Continence Society definitions [1], with a diagnosis of urinary incontinence based on the results of a subjective or objective assessment. Studies were excluded if the aim was to analyze urinary symptoms of children or men, if they included only continent women, and/or if the authors analyzed only other pelvic floor dysfunctions (i.e., fecal and/or anal incontinence, pelvic organ prolapse, sexual dysfunctions).
Intervention/Instruments of interest (construct targeted): Studies were included if they assessed any outcome measure related to urinary incontinence, such as quality of life and/or amount of leakage. We also looked for outcomes that assessed pelvic floor muscle function, evaluated by questionnaires or by physical tests that include vaginal palpation, dynamometry, vaginal cones, manometry, electromyography, imaging exams, urodynamics, and/or the urine stream interruption test [33]. However, no such studies were found during screening.
Comparison: Not applicable.
Outcomes: Studies that reported minimal important differences derived from distribution- or anchor-based methods, as described in a previous study [17], were included. A detailed description of the methods available to determine the minimal important difference in clinical research is presented in Appendix 1.

Study design: Any study generating minimal important differences for urinary incontinence outcomes (randomized controlled trials and controlled trials, secondary analyses of clinical trials, cohort studies, cross-sectional studies, and reliability, responsiveness, and validity studies) was included. The following types of studies were excluded: case reports, reviews, systematic reviews, meta-analyses, commentaries, letters to the editor, conference papers, book chapters, protocol registrations, abstracts without full text, and experimental studies. Reviews were screened carefully for relevant references.
Searches were performed on June 9, 2021 and updated on January 9, 2024, including the main terms for urinary incontinence and minimal important difference. In addition, a search filter focusing on clinical significance keywords obtained from previous publications was used [34] (details available in Appendix 2). Five databases were consulted: Medline (Ovid MEDLINE(R) ALL), Embase (Ovid interface), CINAHL PLUS with Full Text (EBSCOhost interface), Web of Science (Indexes=SCI-EXPANDED, SSCI, A&HCI, ESCI), and Scopus. No limits were applied for date, language, or publication status. A manual search was performed to look for relevant references. Included studies were tracked in the Web of Science database.

Study selection
Results from the searches were compiled in EndNote software and imported into Covidence (www.covidence.org), which was used during the screening process. Two independent researchers evaluated the studies' eligibility according to the inclusion and exclusion criteria in two sequential phases: (I) analysis of titles and abstracts; and (II) analysis of full texts. In case of disagreement, a consensus meeting was held. If the disagreement persisted, a third evaluator made the final decision. The results of the selection process are reported in a PRISMA flowchart [35].

Data extraction
An Excel form was developed for data extraction. Pilot testing and regular revision through discussions were used to standardize the data extraction form and process. One researcher conducted the data extraction and organized the data in the Excel form, and a second researcher reviewed the extracted data for accuracy and completeness. Disagreements were resolved in consensus meetings.
The extracted data included, but were not limited to: 1) article information (first author, year of publication, language, funding, country, aims, study design, and setting); 2) population information (age, diagnosis, diagnostic tool, and other conditions or characteristics); 3) outcome measurements: minimal important difference determination (e.g., analytical approach, sample size, duration of follow-up when applicable); minimal important difference estimation methods (distribution- and/or anchor-based, the specific anchor applied during data collection, minimal important difference values); constructs evaluated (e.g., quality of life evaluated according to patient-reported outcome measures, pelvic floor function, urinary loss); tool description (categorical, ordinal, or numerical data); and type of outcome (patient-reported outcome measure or physical test); 4) summary of results: minimal important difference estimation; correlations between the outcome and the anchor; precision of the minimal important difference (e.g., 95% confidence interval / minimal important difference × 100); time between baseline and follow-up; directions of both the anchor and the patient-reported outcome measure (e.g., whether an increase in the scores of both instruments reflects improvement or worsening, or whether the scores of the two instruments have opposite meanings); and correlations between the patient-reported outcome measure and the transition item at baseline and follow-up. When quantitative data were missing, the authors of the primary studies were contacted to obtain the unreported data. When the authors did not answer our request, data were extracted from the graphs available in the studies.

Credibility of minimal important difference estimates
Two independent researchers conducted the credibility assessment of the minimal important difference in each included study that used anchor-based methods. To the authors' knowledge, there is no specific tool to assess the credibility of minimal important differences reported according to distribution-based methods. The credibility was evaluated separately for each minimal important difference by the two assessors, and the final assessment was determined after a consensus meeting between them. The instrument developed by Devji et al. [34] for this specific purpose was used under license from McMaster University, as it is the only published tool created for evaluating the credibility of minimal important differences generated by anchor-based methods. It is composed of 1) core criteria with five items related to anchor-based methods, and 2) four items related to transition rating anchors. The first item has a dichotomous yes/no response option; the other items use a five-point scale with the following response options: definitely yes, to a great extent, not so much, definitely no, or impossible to tell.
There is no specific guidance on how to summarize the different domains of this tool into a final assessment of the credibility of the minimal important difference. Therefore, the final assessment for each minimal important difference followed decision rules prepared in advance by the team, based on similar decision rules used when implementing the Cochrane risk of bias tool for randomized controlled trials (RoB 2). Three categories were created: 1) Low credibility: when most items, or at least one item, were scored with a negative answer (i.e., not so much or definitely no); 2) Some concerns: when no item was scored with a negative answer and the remaining questions were assessed as "impossible to tell"; 3) High credibility: when all the questions were assessed with a positive answer (i.e., to a great extent or definitely yes).
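The decision rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the review team's actual procedure: the function name `classify_credibility` is hypothetical, the "most or one of the items" rule is read as "at least one negative answer", and the plain "yes"/"no" of the first item is treated as a positive/negative answer.

```python
# Response options of the credibility instrument, grouped as in the text.
# Assumption: the dichotomous first item ("yes"/"no") counts as positive/negative.
POSITIVE = {"yes", "definitely yes", "to a great extent"}
NEGATIVE = {"no", "definitely no", "not so much"}
UNCLEAR = {"impossible to tell"}
VALID = POSITIVE | NEGATIVE | UNCLEAR


def classify_credibility(answers):
    """Return the overall credibility category for one MID estimate.

    Hypothetical helper: `answers` holds one response string per item.
    """
    normalized = [a.strip().lower() for a in answers]
    unknown = [a for a in normalized if a not in VALID]
    if not normalized or unknown:
        raise ValueError(f"unexpected or missing answers: {unknown}")
    # Assumption: a single negative answer is enough for "low credibility".
    if any(a in NEGATIVE for a in normalized):
        return "low credibility"
    # No negative answers, but at least one item could not be judged.
    if any(a in UNCLEAR for a in normalized):
        return "some concerns"
    # Every item received a positive answer.
    return "high credibility"
```

For example, an estimate with the answers "yes", "definitely yes", and "not so much" would be classified as low credibility under this reading of the rules.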

Data synthesis
The findings of this review were described in a narrative (descriptive) synthesis, organized in evidence tables that compiled study details, results, and data analysis. Data synthesis was performed according to the patient-reported outcome measures reported by the authors and the method of calculation used to provide the minimal important difference. Minimal important differences provided by distribution-based methods were analyzed separately according to the type of calculation (i.e., effect size, standardized response mean, standard error of measurement, standard deviation) and the time range of reevaluation (e.g., 6 weeks, 12 weeks, 12 months). Minimal important differences provided by anchor-based methods were synthesized following guidance from a previous systematic review about the minimal important difference [26].
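As a rough illustration of the distribution-based calculations named above, the sketch below uses the textbook definitions of the effect size (mean change divided by the baseline standard deviation), the standardized response mean (mean change divided by the standard deviation of the change scores), and the standard error of measurement (baseline standard deviation times the square root of one minus a reliability coefficient). The function names, the example scores, and the reliability value are assumptions for illustration; the primary studies may have used variants of these formulas.

```python
from math import sqrt
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Mean change divided by the SD of the baseline scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the change scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def standard_error_of_measurement(baseline, reliability):
    """Baseline SD scaled by sqrt(1 - reliability).

    `reliability` is assumed to be a coefficient between 0 and 1,
    for example a test-retest ICC or Cronbach's alpha.
    """
    return stdev(baseline) * sqrt(1 - reliability)


# Made-up questionnaire scores for four participants (illustrative only).
baseline_scores = [40, 50, 60, 70]
follow_up_scores = [50, 55, 75, 80]

es = effect_size(baseline_scores, follow_up_scores)
srm = standardized_response_mean(baseline_scores, follow_up_scores)
sem = standard_error_of_measurement(baseline_scores, reliability=0.84)
```

All three statistics depend only on the spread of the scores, which is why they say nothing about whether patients perceive the change as important.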
The absolute minimal important difference (mean difference associated with minimum improvement) was calculated for each study separately by checking the original papers and by extracting the mean change of the group of participants that reported a slight improvement, according to the anchor applied during data collection.
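A minimal sketch of this calculation is shown below, assuming each participant record carries a baseline score, a follow-up score, and the anchor category the participant selected. The function name `absolute_mid`, the record layout, and the label "slightly improved" are hypothetical; the actual anchor wording differs between primary studies.

```python
from statistics import mean


def absolute_mid(records, anchor_category="slightly improved"):
    """Mean change score of the participants in one anchor category.

    Hypothetical helper: each record is a dict with the keys
    "baseline", "follow_up", and "anchor".
    """
    changes = [
        r["follow_up"] - r["baseline"]
        for r in records
        if r["anchor"] == anchor_category
    ]
    if not changes:
        raise ValueError(f"no participants in category {anchor_category!r}")
    return mean(changes)


# Made-up data for illustration only.
participants = [
    {"baseline": 40, "follow_up": 46, "anchor": "slightly improved"},
    {"baseline": 55, "follow_up": 59, "anchor": "slightly improved"},
    {"baseline": 50, "follow_up": 70, "anchor": "much improved"},
    {"baseline": 60, "follow_up": 60, "anchor": "no change"},
]

mid_estimate = absolute_mid(participants)
```

For instruments on which a lower score means improvement, the resulting value is negative, so the direction of the scale has to be checked before estimates are compared.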
After data synthesis, we planned to plot all minimal important difference estimates based on anchor methods together by triangulation, in order to define a single value for each instrument included in the present review, on the assumption that we would find evidence from multiple studies. However, the primary studies were highly heterogeneous in terms of patient-reported outcome measures, anchors, and population characteristics, which violated the recommendations for performing triangulation [36]. A meta-analysis was also not possible because of insufficient data.

Quality of evidence
The Grading of Recommendations Assessment, Development, and Evaluation (GRADE) [37] approach was applied to assess the overall certainty of the evidence and to grade the strength of recommendations for minimal important differences reported according to anchor-based methods. This assessment was based on the credibility of the minimal important difference (treated as analogous to the risk of bias of studies), inconsistency, indirectness, imprecision, and publication bias. We reported GRADE following previous recommendations on how to rate the certainty of evidence in the absence of pooled results and meta-analysis [38].
The level of evidence was downgraded for inconsistency and/or indirectness in cases where: minimal important differences for a patient-reported outcome measure were reported by a single study; different anchors were applied to calculate the minimal important difference; studies included different population diagnoses or time points at which the minimal important differences were calculated; or studies used different levels of improvement (minimal, moderate, or strong) to determine the minimal important difference. The evidence was downgraded for imprecision when the total sample size was less than 300 participants.
The final rating of the studies was classified as high, moderate, low, or very low certainty of evidence [37].

Results

Study selection
A total of 1,662 papers were found through the database search, of which 719 were duplicates, leaving 943 records for screening. After screening of titles and abstracts, 54 potential studies were selected for full-text review, and 10 studies met the inclusion criteria [39-48]. Reasons for exclusion are available in the PRISMA flowchart (Fig. 1), and details of exclusions are provided in Appendix 3. After the manual search, two additional studies were included [49,50]. Therefore, 12 studies were analyzed.
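The counts in this paragraph can be cross-checked with simple arithmetic. The sketch below is only a consistency check; the function name `prisma_flow` and the dictionary keys are hypothetical, and the two exclusion counts are derived by subtraction rather than quoted from the flowchart.

```python
def prisma_flow(identified, duplicates, full_text, included_db, included_manual):
    """Derive the intermediate counts of a study selection flow."""
    screened = identified - duplicates
    return {
        "screened": screened,
        # Derived by subtraction, not reported in the text.
        "excluded_title_abstract": screened - full_text,
        "excluded_full_text": full_text - included_db,
        "total_included": included_db + included_manual,
    }


flow = prisma_flow(
    identified=1662,
    duplicates=719,
    full_text=54,
    included_db=10,
    included_manual=2,
)
```

With these inputs, the screened and total included counts come out at 943 and 12, matching the figures reported above.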

Analysis of credibility
Ten studies [39-43, 45, 46, 48-50] determined minimal important differences of several patient-reported outcome measures using anchor-based methods and provided 78 different minimal important differences. Therefore, we performed one evaluation for each minimal important difference separately, resulting in 78 credibility assessments. All reports related to minimal important differences according to anchor-based methods presented low credibility. More details about the scores of the credibility tool are reported in Appendix 4.
In all cases (n=78), the studies met the first criterion of the tool, which assesses whether participants responded to the patient-reported outcome measures and the anchor directly. Moreover, the anchors used during data collection were considered understandable (second criterion) in 75 cases.
In 24 derived minimal important difference calculations, the correlation between the patient-reported outcome measure and the anchor was not reported (third criterion), although most authors mentioned a general correlation of ≥0.3 between the instruments (n=52). Similarly, most estimates failed to meet the fourth criterion of the tool, which concerns the precision of the minimal important difference estimate (n=61; 78.2%). In 42 cases, the threshold applied on the anchor did not reflect a small but important difference in the health status of the patients, which contradicts the definition of the minimal important difference.
For 63 minimal important difference estimates, the time between the first and the second assessments was considered long (more than two or three months), which is the sixth criterion. This can likely be linked to recall bias (i.e., a biased perception of the actual health status) and difficulty in assessing the previous health status [34]. The correlation between the transition score and the prescore and postscore on the target instrument (seventh and eighth criteria) was reported for only a few estimates in three different studies [42,43,46].
The risk of bias graph and the summary results are presented in Appendices 5 and 6, respectively.

Synthesis of results
Minimal important difference estimates were provided for 13 different patient-reported outcome measures. Although we targeted several types of outcomes in this review, no study reported minimal important difference estimates for, for example, the physical assessment of pelvic floor muscle function. Some authors also provided the minimal important difference for subscales of patient-reported outcome measures. This was the case for the Incontinence Quality of Life (I-QOL) questionnaire: Avoidance and Limiting Behavior, Psychosocial Impacts, and Social Embarrassment domains [40]; the Pelvic Floor Impact Questionnaire (PFIQ): UIQ subscale; the Pelvic Floor Distress Inventory (PFDI): general score for the UDI [43] and stress and irritative subscales [41]; the Overactive Bladder Questionnaire (OAB-q): Symptom Severity subscore [42]; the Australian Pelvic Floor Questionnaire: bladder and global scores [49]; and the International Consultation on Incontinence Questionnaire - Female Lower Urinary Tract Symptoms (ICIQ-FLUTS): incontinence domain [50].
Ten different subjective and objective anchors were found among the studies. The Patient Global Impression of Improvement, also known as the Global Rating Scale, was the most used, followed by the voiding diary, satisfaction with the treatment, and the pad test.
Table 2 describes the main details regarding the population, the patient-reported outcome measures, anchors, data analysis, and conclusions reported by the included studies. Although one study reported minimal important differences according to anchor methods for the Michigan Incontinence Symptom Index (M-ISI) [44], its results were not considered in the present review because the statistical method applied by the authors was not clear in the manuscript and the authors did not respond to our e-mail. Appendix 7 provides details about the methods and concepts used to derive minimal important differences using anchor-based methods. Appendix 8 presents a matrix table compiling the minimal important differences extracted from the primary studies according to distribution- and anchor-based methods.
Tables 3 and 4 provide the qualitative data extracted from the studies that reported minimal important differences according to distribution- and anchor-based methods, respectively. Minimal important difference estimates for distribution-based methods are expressed in points on each patient-reported outcome measure. Three main distribution-based analyses were used by the included studies: effect size, standardized response mean, and standard error of measurement. Minimal important differences based on anchor methods were reported using different estimates, including the mean, standard deviation, and absolute value, followed by the 95% confidence intervals and minimum-maximum values for the specific patient-reported outcome measures. Time points (follow-up) differed between studies (6, 10, 12, 14 weeks; and 4, 8, 12 and 12 months). In addition, the time point was unclear in four primary studies [42,44,46,47]. Table 4 also uses different symbols to show the level of improvement considered by the authors when calculating the minimal important differences by anchor-based methods. Although different patient-reported outcome measures and anchors were applied, most of the studies did not use the smallest difference identified by the participants to calculate the minimal important difference. The level most often used to generate the minimal important difference was moderate to strong improvement.
Figure 2 provides the minimal important difference estimates ranging from 0 to 10 points on their respective patient-reported outcome measures from the included studies, considering the score related to the smallest improvement of urinary incontinence. Figure 3 presents the minimal important differences for patient-reported outcome measures with a wider range of scores (-150 to +150).

Certainty of evidence
All the minimal important differences reported by anchor-based methods were rated as very low certainty of evidence. For more details about GRADE, see Appendix 9.
All studies [39-43, 45, 46, 48-50] presented very serious concerns about the risk of bias, which means that they presented low credibility in calculating and reporting the minimal important difference according to anchor-based methods. Inconsistency across studies was also rated as serious or very serious.

Publication bias was not considered for this systematic review since the search process was comprehensive and exhaustive.

Discussion
We included 12 studies that reported minimal important differences for outcome measures used when managing female urinary incontinence, with high variability in methods and values. The minimal important differences of thirteen different patient-reported outcome measures were reported, most of the time according to anchor-based methods, using ten different anchors. However, all studies with anchor-based methods presented low credibility and very low overall certainty. Also, minimal important difference values seem to change according to the time points used to generate them (i.e., follow-up of 4 or 6 weeks, 12 and 24 months).
Similar to a previous review [51], minimal important differences provided by distribution-based methods were smaller than the ones provided by anchor-based methods, which could suggest that a smaller change is necessary to represent a clinically significant difference [52]. Distribution-based methods consider only the distribution of the scores in their calculations and are usually related to the variation or change observed, in a standardized way, around the mean. For this reason, previous literature suggested that anchor-based methods should be preferred over distribution-based methods [17].
A possible explanation for the wide variability around these minimal important differences may be the level of improvement of the patients considered during data analysis. Some authors have argued that there is neither consensus nor evidence about the best criteria to determine the minimal important difference using anchor-based methods [17,53]. Nevertheless, calculations that include groups of participants who considered themselves moderately or greatly improved after an intervention could lead to different minimal important difference estimates, and they do not follow the original concept of the minimal important difference, which refers to the "smallest difference" in scores that individuals consider to be beneficial [54]. In the present systematic review, the majority of studies did not consider the smallest change of improvement (as perceived by the patients) in their calculations, so future studies could be biased if they use these values to estimate their sample size or to interpret their results. Halme et al. [55] published a study that compiled estimates for calculating sample sizes of trials to treat female urinary incontinence according to minimal important differences. In their statistical analysis, the authors included participants who reported a "very much better" improvement after treatment, which does not represent the smallest difference perceived by the patient.
Previous studies [26,53] recognized the need for validation studies of the anchors that are commonly used to collect data about patients' perception of a treatment. Furthermore, the procedures to assess changes that are important to the patient need to be standardized by establishing a valid and specific question for that purpose. The lack of validation and standardization leads to variability in the results, because different anchors are applied to calculate minimal important differences [53], generating inconsistency between studies that assess minimal important differences.
The literature suggests that anchors should be selected based on their relevance and should be closely related to the construct assessed by the patient-reported outcome measure, which is usually analyzed through the correlation between the tools (anchor and patient-reported outcome measure). Researchers and clinicians should also consider the characteristics of the sample and the severity of the disease in order to define an adequate anchor. In addition, this rationale should be based on previous guidance and scientific evidence [29]. A previous study also found that derived minimal important differences are highly variable due to discrepancies in the study designs, methods, and concepts used when calculating the minimal important differences [26]. These results agree with the present review.
The newly developed tool used to assess the credibility of minimal important differences derived by anchor-based methods showed that the studies presented low credibility. Most studies did not report a prerequisite of minimal important difference calculation, which is the correlation between the patient-reported outcome measure and the anchor. In addition, only three studies [42,43,46] reported the correlations between anchors and patient-reported outcome measure scores during follow-up. This missing information could also help to explain the variability found in the minimal important difference values [53]. Because the anchor and the patient-reported outcome measure should assess the same or similar underlying constructs, a correlation between the tools indicates that they are closely linked. Therefore, anchors with absent or low correlation will provide inaccurate minimal important difference estimates [34].
Attention should be drawn to methodological issues related to the calculation and reporting of minimal important differences when interpreting the results reported in the literature. It is important to evaluate the credibility of the minimal important difference, since there is substantial misunderstanding of methods and concepts that can lead to incorrect reporting of minimal important difference values. Authors should follow existing guidance when conducting studies with this aim. This guidance can be found in previous studies [17] and by interpreting and incorporating the items assessed by the credibility tool [34] in future studies.
This review contributes substantially to Women's Health research. A summary of the minimal important differences for outcomes related to urinary symptoms may contribute to evidence-based practice by complementing statistical results with clinicians' experience and patients' perception of a treatment [17,28]. It may result in a new direction for the treatment of urinary symptoms, since it brings a focus to interventions that are clinically relevant and can be successfully implemented in clinical practice. Moreover, a new interpretation of results from the literature may be incorporated, as we highlight estimates that can be used to classify study results as clinically relevant and not only on the basis of statistical results. It may also reveal that previous studies over- or underestimated treatment effects by interpreting only the results of statistical analysis. In addition, our results could facilitate the design and planning of future studies, for example by supporting accurate sample size calculations and the choice of the best outcome measures, thereby facilitating the future uptake of clinical research into practice. Therefore, researchers are encouraged to incorporate these outcomes in their clinical studies to measure the effectiveness of interventions, taking into consideration not only statistical significance but also clinical relevance.
This systematic review followed a rigorous methodological sequence, which included the preparation and registration of a review protocol and a systematic search of the most important databases. Eligibility screening, data extraction, and credibility assessment were performed by two independent researchers. Moreover, the present review only included studies that reported minimal important differences according to analyses already recommended by previous guidelines. We reported which tools already have a minimal important difference available for use in clinical research. In addition, we synthesized the steps and information necessary to calculate and analyze the minimal important difference, together with guidance to help researchers interpret it correctly. Furthermore, some limitations and misconceptions related to minimal important differences that arose from the results of the present review were emphasized.
The present systematic review has some limitations. The limited number of included studies did not allow us to perform sub-analyses according to the type of urinary incontinence, the method of calculation (i.e., distribution- or anchor-based), and/or the anchors used during data analysis. Moreover, it was not possible to assess the credibility of studies that reported minimal important differences according to distribution-based methods, as the tool described by Devji et al. [34] was developed to evaluate studies that reported minimal important differences by anchor-based methods (the most widely accepted approach to generating minimal important differences). In addition, although guidance exists on how to apply the tool, some specific points required clarification, especially when deriving a final assessment. The authors of the present review agreed on decision rules to assess the credibility of the minimal important differences derived in the analyzed studies. These decision rules might be considered arbitrary; however, they were based on similar decision rules used in risk of bias (RoB) assessment of randomized controlled trials (RCTs).
Although we provide minimal important differences derived by anchor-based methods according to the smallest improvement based on the mean change, our analysis was restricted by the availability of data reported by the studies, such as the patient-reported outcome measure scores of the group of patients who considered themselves "a little better". Where these data were not available, the calculation was not possible, which limited the information reported in our review.
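The calculation referred to above can be sketched as follows, assuming that a study reports the mean change, its standard deviation, and the number of participants in the "a little better" group. The function name and the example values are hypothetical. The normal approximation (z = 1.96) for the 95% confidence interval is an assumption made here for simplicity; a t-based interval would be more appropriate for small groups.

```python
from math import sqrt


def mid_from_slightly_improved(mean_change, sd_change, n, z=1.96):
    """Estimate the MID as the mean change of the 'a little better' group.

    Returns (mid, ci_lower, ci_upper). The interval uses a normal
    approximation, which is an assumption of this sketch.
    """
    if n < 2:
        raise ValueError("at least two participants are needed")
    half_width = z * sd_change / sqrt(n)
    return mean_change, mean_change - half_width, mean_change + half_width


# Hypothetical summary data for one PROM in one study.
mid, lower, upper = mid_from_slightly_improved(mean_change=4.0, sd_change=5.0, n=25)
```

When a study does not report the change scores of the slightly improved group, none of the three inputs is available and the estimate cannot be reproduced, which is the restriction described in the paragraph above.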
We planned to triangulate minimal important differences derived from the same patient-reported outcome measures, considering the method of calculation (i.e., distribution- or anchor-based) and/or the anchors used during data analysis. However, given the variability among the studies, it was not possible to calculate a single minimal important difference value for each patient-reported outcome measure. This is a common limitation among systematic reviews that try to compile the minimal important differences available for different patient-reported outcome measures [26,56]. Previous reports [39,58,64,6] concluded that the minimal important difference cannot be interpreted as a constant characteristic and that a universal empirical score cannot be derived. Instead, it is recommended that the minimal important difference be analyzed and considered according to the baseline severity of the condition, the type of treatment, the units of the patient-reported outcome measure, the conditions of the population, and the context in which the patient is located [29,51,56,57]. In addition, it seems that minimal important differences can also change according to the characteristics of the population [53]. That was also the case in the present study: minimal important differences from a population with urgency urinary incontinence [42] differed from those for the same patient-reported outcome measures in a sample with stress urinary incontinence [41]. Therefore, authors should take care to include these characteristics in their reports of minimal important differences.
Moreover, because of the limited number of studies, we did not perform sensitivity analyses to explore the factors that could lead to the variability among the minimal important differences reported by the authors. Future studies should perform specific statistical analyses to identify the factors associated with this variability, in order to reduce the disparity among studies. In addition, future studies should follow the recommendations for reporting minimal important differences and should: 1) report baseline and follow-up scores, to enable future explorations even considering the variability among studies [26]; 2) improve the reporting of the correlations found between anchors and patient-reported outcome measures at baseline and follow-up; and 3) aim to validate anchors often used in Women's Health studies.
Twelve different patient-reported outcome measures with respective minimal important differences for outcomes related to urinary incontinence were found in the literature, comprising 48 and 65 minimal important differences reported according to distribution- and anchor-based methods, respectively. Values derived from distribution-based methods were smaller than those derived from anchor-based methods. However, the credibility and certainty of evidence of all the minimal important differences related to urinary incontinence measures reported by anchor-based methods were low or very low. The methodology used to derive minimal important differences for outcomes related to urinary incontinence needs to be improved.

Fig. 2 Fig. 3
Fig. 2 MID estimates and 95% CI considering the slight improvement reported by the authors, for MIDs ranging from 0 to 10 points in their respective PROMs. CI: confidence interval; ICIQ-SF: International Consultation on Incontinence Questionnaire - Short Form; I-QOL: Incontinence Quality of Life; MID: minimal important difference; PGI-I: Patient Global Impression of Improvement questionnaire; PROM: patient-reported outcome measure

Table 1
General information of included studies (n=12)

Table 2
Characteristics of primary studies included in this systematic review

Table 2 (continued)
GPI Global Perception of Improvement, ICIQ-FLUTS International Consultation on Incontinence Questionnaire - Female Lower Urinary Tract Symptoms, ICIQ-SF International Consultation on Incontinence Questionnaire - Short Form, ICIQ-LUTSqol ICIQ-Lower Urinary Tract Symptoms Quality of Life, IIQ Incontinence Impact Questionnaire, I-QOL Incontinence Quality of Life, KHQ King's Health Questionnaire, MID Minimal important difference, M-ISI Michigan Incontinence Symptom Index, MUI Mixed urinary incontinence, n Sample size, nº Number, OAB-q Overactive Bladder Questionnaire, PFMT Pelvic floor muscle training, PGI-I Patient Global Impression of Improvement, PSQ Patient Satisfaction Questionnaire, POP Pelvic organ prolapse, PROM Patient-reported outcome measure, ROC Receiver operating characteristic, SUI Stress urinary incontinence, UDI Urogenital Distress Inventory, UDI-stress Urogenital Distress Inventory, stress symptoms subscale, UIQ Urinary Impact Questionnaire, UUI Urgency urinary incontinence, VAS Visual analogue scale

Table 3
Quantitative results from the studies included in the present systematic review, according to distribution-based methods.
MID Minimal important difference, NA Not applicable, SUI Stress urinary incontinence, PROM Patient-reported outcome measure, UUI Urgency urinary incontinence, - No quantitative estimate was provided
a The effect size represents the standardized change of the score on the target instrument. It can be classified as small, medium, or large using thresholds of 0.20, 0.50, and 0.80, respectively
b Values presented in this table are the MIDs reported in points, according to each specific PROM (questionnaire)
c MID (95% CI)

Table 4
Quantitative results from the studies included in the present systematic review, according to anchor-based methods